Quantitative Big Imaging

Kevin Mader, Christian Dietz
19 February 2015

ETHZ: 227-0966-00L

Introductions and Workflows

Overview

  • Who are we?
  • Who are you?
    • What is expected?
  • Why does this class exist?
    • Collection
    • Changing computing (Parallel / Cloud)
    • Course outline
  • What is an image?
  • Where do images come from?
  • Science and Reproducibility
  • Workflows

Who are we?

  • Kevin Mader (mader@biomed.ee.ethz.ch)
    • Lecturer at ETH Zurich
    • Postdoc in the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute
    • Spin-off 4Quant for Big Data with Images

Kevin Mader

  • Marco Stampanoni (marco.stampanoni@psi.ch)
    • Professor at ETH Zurich
    • Group Leader for the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute

Marco Stampanoni

Who are we (continued)?

  • Anders Kaestner (anders.kaestner@psi.ch)
    • Group Leader at the ICON Beamline at the SINQ (Neutron Source) at Paul Scherrer Institute

Anders Kaestner

Who are we (continued)?

  • Filippo Arcadu (filippo.arcadu@psi.ch)
    • Exercise assistance
    • PhD Student in the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute

Filippo Arcadu

Christian Dietz

Who are you?

A wide spectrum of backgrounds

  • Biomedical Engineers, Physicists, Chemists, Art History Researchers, Mechanical Engineers, and Computer Scientists

A wide range of skills

  • “I think I've heard of Matlab before” \( \rightarrow \) “I write templated C++ code and hand-optimize it afterwards”

So how will this ever work?

Adaptive assignments

  1. Conceptual, graphical assignments with practical examples
    • Emphasis on choosing correct steps and understanding the workflow
  2. Opportunities to create custom implementations, plugins, and perform more complicated analysis on larger datasets if interested
    • Emphasis on performance, customizing analysis, and scalability

Course Expectations

Exercises

  • Usually 1 set per lecture
  • Optional (but recommended!)
  • Easy - using GUIs (KNIME and ImageJ) and completing Matlab Scripts (just lecture 2)
  • Advanced - Writing Python, Java, Scala, …

Science Project

  • Optional (but strongly recommended)
  • Applying Techniques to answer scientific question!
    • Ideally use on a topic relevant for your current project, thesis, or personal activities
    • or choose from one of ours (will be online, soon)
  • Present approach, analysis, and results

Literature / Useful References

General Material

  • Julien Claude, “Morphometrics with R”, (New York, Springer)
  • John C. Russ, “The Image Processing Handbook”, (Boca Raton, CRC Press)
    • Available online within domain ethz.ch (or proxy.ethz.ch / public VPN)
  • J. Weickert, Visualization and Processing of Tensor Fields

Today

Motivation

Crazy Workflow

  • To understand what, why and how from the moment an image is produced until it is finished (published, used in a report, …)
  • To learn how to go from one analysis on one image to 10, 100, or 1000 images (without working 10, 100, or 1000X harder)

Motivation (Why does this class exist?)

  • Detectors are getting bigger and faster constantly
  • Today's detectors are really fast
    • 2560 x 2160 images @ 1500+ times a second = 8GB/s
  • Matlab / Avizo / Python / … are saturated after 60 seconds
  • A single camera
    • More information per day than Facebook*
    • Three times as many images per second as Instagram**
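As a back-of-the-envelope check of the detector rate above (the 1-byte pixel depth is an assumption chosen to reproduce the stated figure):

```python
# Data rate of the detector quoted above; the 1-byte pixel depth is an
# assumption that reproduces the stated ~8 GB/s.
width, height, fps = 2560, 2160, 1500
bytes_per_pixel = 1

rate = width * height * fps * bytes_per_pixel  # bytes per second
print(f"{rate / 1e9:.1f} GB/s")  # → 8.3 GB/s
```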

X-Ray

  • SRXTM images at (>1000fps) → 8GB/s
  • cSAXS diffraction patterns at 30GB/s
  • Nanoscopium Beamline, 10TB/day, 10-500GB file sizes

Optical

  • Light-sheet microscopy (see talk of Jeremy Freeman) produces images → 500MB/s
  • High-speed confocal images at (>200fps) → 78Mb/s

Personal

  • GoPro 4 Black - 60MB/s (3840 x 2160 x 30fps) for $600
  • fps1000 - 400MB/s (640 x 480 x 840 fps) for $400

Motivation (Is it getting better?)

  1. Experimental Design: finding the right technique and picking the right dyes and samples has stayed relatively consistent; better techniques lead to more demanding scientists.

  2. Management: storing, backing up, and setting up databases have become easier and more automated as data magnitudes have increased.

  3. Measurements: the actual acquisition speed of the data has increased wildly due to better detectors, parallel measurement, and new higher-intensity sources.

  4. Post Processing: this portion is the most time-consuming and difficult and has seen minimal improvement over the last years.

[figure]

Saturating Output

[figure]

Year Measurements Publications
2000 146 67
2008 584 110
2014 1031 128
2020 1081 133

To put more real numbers on these scales rather than 'pseudo-publications', the time to measure a terabyte of data is shown in minutes.

Year Time to 1 TB in Minutes
2000 4096
2008 1092
2014 32
2016 2

How much is a TB, really?

If you looked at one 1000 x 1000 sized image every second, it would take you 139 hours to browse through a terabyte of data.
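The 139-hour figure can be verified directly, assuming 16-bit (2-byte) pixels:

```python
# Browsing 1 TB at one 1000 x 1000 image per second; the 2-byte pixel
# depth is an assumption that reproduces the stated figure.
terabyte = 1e12                   # bytes
image_bytes = 1000 * 1000 * 2     # one 16-bit image
images = terabyte / image_bytes   # 500,000 images
hours = images / 3600             # at one image per second
print(f"{hours:.0f} hours")       # → 139 hours
```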

Year Time to 1 TB Man power to keep up Salary Costs / Month
2000 4096 min 2 people 25 kCHF
2008 1092 min 8 people 95 kCHF
2014 32 min 260 people 3255 kCHF
2016 2 min 3906 people 48828 kCHF

Overwhelmed

  • Count how many cells are in the bone slice
  • Ignore the ones that are ‘too big’ or shaped ‘strangely’
  • Are there more on the right side or left side?
  • Are the ones on the right or left bigger, top or bottom?

cells in bone tissue

More overwhelmed

  • Do it all over again for 96 more samples, this time with 2000 slices instead of just one!

more samples

Bring on the pain

  • Now again with 1090 samples!

even more samples

It gets better

  • Those metrics were quantitative and could be easily visually extracted from the images
  • What happens if you have softer metrics

alignment

  • How aligned are these cells?
  • Is the group on the left more or less aligned than the right?
  • errr?

Dynamic Information

  • How many bubbles are here?
  • How fast are they moving?
  • Do they all move the same speed?
  • Do bigger bubbles move faster?
  • Do bubbles near the edge move slower?
  • Are they rearranging?

Computing has changed: Parallel

Moores Law

\[ \textrm{Transistors} \propto 2^{T/(\textrm{18 months})} \]

Based on trends from Wikipedia and Intel, and data from https://gist.github.com/humberto-ortiz/de4b3a621602b78bf90d

There are now many more transistors inside a single computer but the processing speed hasn't increased. How can this be?

  • Multiple Core
    • Many machines have multiple cores for each processor which can perform tasks independently
  • Multiple CPUs
    • More than one chip is commonly present
  • New modalities
    • GPUs provide many cores which operate at slower clock speeds

Parallel Code is important

Computing has changed: Cloud

  • Computers, servers, and workstations are wildly underused (the majority run at <50% utilization)
  • Buying a big computer that sits idle most of the time is a waste of money

http://www-inst.eecs.berkeley.edu/~cs61c/sp14/ “The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007

cloud services

  • Traditionally the most important performance criterion was time: how fast can it be done?
  • With Platform as a Service, servers can be rented instead of bought
  • Speed is still important, but with cloud computing, $ / sample is the real metric
  • In Switzerland a PhD student is 400x as expensive per hour as an Amazon EC2 machine
  • Many competitors keep prices low and offer flexibility
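The shift from raw speed to cost per sample can be made concrete with a toy calculation; only the 400x ratio comes from the slide above — the hourly machine price and the throughput are made-up assumptions:

```python
# Illustrative $/sample comparison -- only the 400x ratio is from the
# slides; the machine price and throughput are invented for the sketch.
machine_per_hour = 0.10                   # assumed cloud machine, $/hour
person_per_hour = 400 * machine_per_hour  # PhD student, per the slides
samples_per_hour = 20                     # assumed analysis throughput

print(f"cloud:  {machine_per_hour / samples_per_hour:.3f} $/sample")
print(f"person: {person_per_hour / samples_per_hour:.2f} $/sample")
```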

Cloud Computing Costs

The figure shows the range of cloud costs (determined by peak usage) compared to a local workstation, with utilization shown as the average number of hours the computer is used each week.

[figure]

The figure shows the cost of a cloud-based solution as a percentage of the cost of buying a single machine; the values below 1 are labeled with the percentage. The panels distinguish the average time to replacement for the machines, in months.

[figure]

Cloud: Equal Cost Point

Here the equal-cost point is shown, where the cloud and local workstations have the same cost. The x-axis is the percentage of resources used at peak time and the y-axis shows the expected usable lifetime of the computer. The color indicates the utilization percentage and the text on the squares shows this as the number of hours used in a week.

[figure]

Course Overview

Lecture Description Applications
19th February - Introduction and Workflows Basic overview of the course, introduction to the basics of images and their acquisition, the importance of reproducibility and why workflows make sense for image processing Calculating the intensity for a folder full of images
26th February - Image Enhancement (A. Kaestner) Overview of what techniques are available for assessing and improving the quality of images, specifically various filters, when to apply them, their side-effects, and how to apply them correctly Removing detector noise from neutron images to distinguish different materials
5th March - Basic Segmentation, Discrete Binary Structures How to convert images into structures, starting with very basic techniques like threshold and exploring several automated techniques Identify cells from noise, background, and dust

Overview: Segmentation

Lecture Description Applications
5th March - Basic Segmentation, Discrete Binary Structures How to convert images into structures, starting with very basic techniques like threshold and exploring several automated techniques Identify cells from noise, background, and dust
12th March - Advanced Segmentation More advanced techniques for extracting structures including basic clustering and classification approaches, and component labeling Identifying fat and ice crystals in ice cream images
19th March - Machine Learning in Image Processing (M. Jäggi and A. Lucchi) Applying more advanced techniques from the field of Machine Learning to image processing segmentation and analysis like Support vector machines (SVM) and Markov Random Fields (MRF) Training an algorithm to automatically identify cells

Overview: Analysis

Lecture Description Applications
26th March - Analyzing Single Objects The analysis and characterization of single structures/objects after they have been segmented, including shape and orientation Count cells and determine their average shape and volume
2nd April - Analyzing Complex Objects What techniques are available to analyze more complicated objects with poorly defined 'shape' using distance maps, thickness maps, and Voronoi tessellation Separate clumps of cells, analyze vessel networks, trabecular bone, and other similar structures
16th April - Spatial Distribution Extracting meaningful information for a collection of objects like their spatial distribution, alignment, connectivity, and relative positioning Quantify cells as being evenly spaced or tightly clustered or organized in sheets

Overview: Big Imaging

Lecture Description Applications
23rd April - Statistics and Reproducibility Making a statistical analysis from quantified image data and establishing the precision of the metrics calculated, with more coverage of the steps to making an analysis reproducible Properly determine if/how a cancerous cell differs from a healthy cell
30th April - Dynamic Experiments Performing tracking and registration in dynamic, changing systems covering object and image based methods Turning a video of foam flow into metrics like speed, average deformation, and reorganization
7th May - Scaling Up / Big Data Performing large-scale analyses on clusters and cloud-based machines and an introduction to working with 'big data' frameworks Performing large-scale analyses using ETH's clusters and Amazon's cloud resources; how to do anything with terabytes of data

Overview: Wrapping Up

Lecture Description Applications
21st May - Guest Lecture, Applications in Material Science Application of the course material to an actual scientific project from material science where we reproduce the results of a publication
28th May - Project Presentations The presentations of the student projects done in the class

What is an image?

A very abstract definition: A pairing between spatial information (position) and some other kind of information (value).

In most cases this is a 2 dimensional position (x,y coordinates) and a numeric value (intensity)

x y Intensity
1 1 91
2 1 88
3 1 45
4 1 11
5 1 56
1 2 40

This can then be rearranged from a table form into an array form and displayed as we are used to seeing images

[figure]
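The table-to-array rearrangement can be sketched in a few lines of numpy, using the rows from the table above (the 2 x 5 image size is an assumption):

```python
import numpy as np

# Pivot the (x, y, Intensity) table into a 2D array of shape (y, x).
rows = [(1, 1, 91), (2, 1, 88), (3, 1, 45),
        (4, 1, 11), (5, 1, 56), (1, 2, 40)]
img = np.zeros((2, 5), dtype=np.uint8)
for x, y, intensity in rows:
    img[y - 1, x - 1] = intensity  # 1-based table -> 0-based array
print(img)
```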

2D Intensity Images

The next step is to apply a color map (also called lookup table, LUT) to the image so it is a bit more exciting

[figure]

Which can be arbitrarily defined based on how we would like to visualize the information in the image

[figure]

[figure]

Lookup Tables

Formally, a lookup table is a function which maps intensity to color: \[ f(\textrm{Intensity}) \rightarrow \textrm{Color} \]

[figure]

These transformations can also be non-linear, as in the graph below where the mapping between intensity and color follows a \( \log \) relationship, meaning that differences between the lower values are much clearer than between the higher ones

[figure]
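A minimal numpy sketch of such a logarithmic lookup table, assuming an 8-bit intensity range:

```python
import numpy as np

# Logarithmic LUT on an 8-bit range: low intensities get stretched
# apart while high intensities are compressed together.
intensity = np.arange(256, dtype=np.float64)
lut = (255 * np.log1p(intensity) / np.log1p(255)).astype(np.uint8)

# spacing between the two lowest vs. the two highest input values
print(lut[1] - lut[0], lut[255] - lut[254])  # → 31 1
```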

On a real image the difference is even clearer

[figure]

3D Images

For a 3D image, the position or spatial component has a 3rd dimension (z if it is spatial, or t if it is a movie)

x y z Intensity
1 1 1 86
2 1 1 59
3 1 1 31
1 2 1 47
2 2 1 99
3 2 1 54

This can then be rearranged from a table form into an array form and displayed as a series of slices

[figure]
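In array form such a volume is commonly stored with the stacking axis first; a small sketch (the shape and values are illustrative):

```python
import numpy as np

# A 3D image as a (z, y, x) stack; each vol[z] is one 2D slice.
vol = np.arange(2 * 3 * 4, dtype=np.uint8).reshape(2, 3, 4)
for z in range(vol.shape[0]):
    print(f"slice z={z + 1}, shape {vol[z].shape}")
```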

Multiple Values

In the images thus far, we have had one value per position, but there is no reason there cannot be multiple values. In fact this is what color images are: (red, green, and blue) values, and sometimes even 4 channels with transparency (alpha) as a fourth. For clarity we call the dimensionality of the image the number of dimensions in the spatial position, and the depth the number of values.

x y Intensity Transparency
1 1 92 91
2 1 64 48
3 1 48 67
4 1 47 33
5 1 77 23
1 2 27 8

This can then be rearranged from a table form into an array form and displayed as a series of channels

[figure]
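The dimensionality / depth distinction in a small numpy sketch (the shapes are illustrative):

```python
import numpy as np

# An RGBA image: dimensionality 2 (x, y), depth 4 (r, g, b, alpha).
rgba = np.zeros((2, 5, 4), dtype=np.uint8)  # axes: (y, x, channel)
dimensionality = rgba.ndim - 1              # spatial axes only
depth = rgba.shape[-1]                      # values per position
print(dimensionality, depth)                # → 2 4
```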

Hyperspectral Imaging

At each point in the image (black dot), instead of having just a single value, there is an entire spectrum. A selected group of these (red dots) are shown to illustrate the variations inside the sample. While certainly much more complicated, this still constitutes an image and requires the same sort of techniques to process correctly.

[figure]

[figure]

Image Formation

Traditional Imaging

  • Impulses: light, X-rays, electrons, a sharp point, magnetic field, sound wave
  • Characteristics: electron shell levels, electron density, phonon energy levels, electronic spins, molecular mobility
  • Response: absorption, reflection, phase shift, scattering, emission
  • Detection: your eye, light-sensitive film, CCD / CMOS, scintillator, transducer

Where do images come from?

Modality Impulse Characteristic Response Detection
Light Microscopy White Light Electronic interactions Absorption Film, Camera
Phase Contrast Coherent light Electron Density (Index of Refraction) Phase Shift Phase stepping, holography, Zernike
Confocal Microscopy Laser Light Electronic Transition in Fluorescence Molecule Absorption and reemission Pinhole in focal plane, scanning detection
X-Ray Radiography X-Ray light Photo effect and Compton scattering Absorption and scattering Scintillator, microscope, camera
Ultrasound High frequency sound waves Molecular mobility Reflection and Scattering Transducer
MRI Radio-frequency EM Unmatched Hydrogen spins Absorption and reemission RF coils to detect
Atomic Force Microscopy Sharp Point Surface Contact Contact, Repulsion Deflection of a tiny mirror

Acquiring Images

Traditional / Direct imaging

  • Visible images are produced or can easily be made visible
  • Optical imaging, microscopy

Here the measurement is supposed to be from a typical microscope, which blurs, flips, and otherwise distorts the image, but the original representation is still visible

Indirect / Computational imaging

  • Recorded information does not resemble object
  • Response must be transformed (usually computationally) to produce an image

Here the measurement is supposed to be from a diffraction-style experiment where the data is measured in reciprocal (Fourier) space and can be reconstructed to the original shape

Traditional Imaging

Traditional Imaging

Copyright 2003-2013 J. Konrad in EC520 lecture, reused with permission

Traditional Imaging: Model

Traditional Imaging Model

\[ \left[\left([b(x,y)*s_{ab}(x,y)]\otimes h_{fs}(x,y)\right)*h_{op}(x,y)\right]*h_{det}(x,y)+d_{dark}(x,y) \]

\( s_{ab} \) is the only information you are really interested in, so it is important to remove or correct for the other components

For color (non-monochromatic) images the problem becomes even more complicated \[ \int_{0}^{\infty} {\left[\left([b(x,y,\lambda)*s_{ab}(x,y,\lambda)]\otimes h_{fs}(x,y,\lambda)\right)*h_{op}(x,y,\lambda)\right]*h_{det}(x,y,\lambda)}\mathrm{d}\lambda+d_{dark}(x,y) \]
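A toy simulation of the monochromatic model above, reading \( b \cdot s_{ab} \) as illumination times sample and each \( h \) term as a blurring kernel; these readings, the 3x3 box kernels, and all parameters are illustrative assumptions rather than the lecture's definitions:

```python
import numpy as np

def box_blur(img):
    # 3x3 box blur with edge padding; stands in for each h(x, y) kernel
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return sum(p[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

rng = np.random.default_rng(0)
s_ab = np.zeros((32, 32))
s_ab[12:20, 12:20] = 1.0                         # the sample of interest
b = np.ones_like(s_ab)                           # flat illumination
signal = box_blur(box_blur(box_blur(b * s_ab)))  # h_fs, h_op, h_det
d_dark = 0.05 * rng.random(s_ab.shape)           # additive dark current
measured = signal + d_dark
```

Recovering \( s_{ab} \) from `measured` is the hard part: every term besides the sample must be corrected for or removed.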

Indirect Imaging (Computational Imaging)

  • Tomography through projections
  • Microlenses (light-field photography)
  • Diffraction patterns
  • Hyperspectral imaging with Raman, IR, CARS
  • Surface topography with cantilevers (AFM)

Surface Topography

On Science

What is the purpose?

  • Discover and validate new knowledge

How?

  • Use the scientific method as an approach to convince other people
  • Build on the results of others so we don't start from the beginning

Important Points

  • While qualitative assessment is important, it is difficult to reliably produce and scale
    • Quantitative analysis is far from perfect, but provides metrics which can be compared and regenerated by anyone

Inspired by: imagej-pres

Science and Imaging

  • Images are great for qualitative analyses since our brains can quickly interpret them without large programming investments.
  • Proper processing and quantitative analysis is however much more difficult with images.

    • If you measure a temperature, quantitative analysis is easy, \( 50K \).
    • If you measure an image it is much more difficult and much more prone to mistakes, subtle setup variations, and confusing analyses

Furthermore, in image processing there is a plethora of tools available:

  • Thousands of algorithms available
  • Thousands of tools
  • Many images require multi-step processing
  • Experimenting is time-consuming
  • Reproducibility

    Science demands repeatability! And really wants reproducibility.

    • Experimental conditions can change rapidly and are difficult to make consistent
    • Animal and human studies are prohibitively time consuming and expensive to reproduce
    • Terabyte datasets cannot be easily passed around many different groups
    • Privacy concerns can also limit sharing and access to data
    • Science is already difficult enough
    • Image processing makes it even more complicated
    • Many image processing tasks are multistep, have many parameters, use a variety of tools, and consume a very long time

    How can we keep track of everything for ourselves and others?

    • We can make the data analysis easy to repeat by an independent 3rd party

Soup Example

Easy to follow the list: anyone with the right steps can execute and repeat (if not reproduce) the soup

Simple Soup

  1. Buy {carrots, peas, tomatoes} at market
  2. then Buy meat at butcher
  3. then Chop carrots into pieces
  4. then Chop potatoes into pieces
  5. then Heat water
  6. then Wait until boiling then add chopped vegetables
  7. then Wait 5 minutes and add meat

More complicated soup

Here it is harder to follow and you need to carefully keep track of what is being performed

Steps 1-4

  1. then Mix carrots with potatoes \( \rightarrow mix_1 \)
  2. then add egg to \( mix_1 \) and fry for 20 minutes
  3. then Tenderize meat for 20 minutes
  4. then add tomatoes to meat and cook for 10 minutes \( \rightarrow mix_2 \)
  5. then Wait until boiling then add \( mix_1 \)
  6. then Wait 5 minutes and add \( mix_2 \)

Using flow charts / workflows

Simple Soup

[figure]

Complicated Soup

[figure]

Workflows

Clearly a linear set of instructions is ill-suited for even a fairly easy soup; it is even more difficult when there are dozens of steps and different pathways

[figure]

Furthermore, a clean workflow allows you to better parallelize the task since it is clear which tasks can be performed independently

[figure]
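That parallelization can be sketched with Python's standard multiprocessing module: once each per-image step is independent, a pool of workers can run them side by side (`analyze()` here is a hypothetical stand-in for a real pipeline step):

```python
from multiprocessing import Pool

def analyze(image):
    # stand-in for a real per-image analysis (here: mean intensity)
    return sum(image) / len(image)

if __name__ == "__main__":
    # each "image" is independent, so the map parallelizes cleanly
    images = [[10, 20, 30], [5, 5, 5], [0, 100, 50]]
    with Pool(2) as pool:
        print(pool.map(analyze, images))  # → [20.0, 5.0, 50.0]
```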